Column Stores for Wide and Sparse Data
Author
Abstract
While it is generally accepted that data warehouses and OLAP workloads are excellent applications for column-stores, this paper speculates that column-stores may also be well suited to additional applications. In particular, we observe that column-stores do not see a performance degradation when storing extremely wide tables, and that they handle sparse data very well. These two properties lead us to conjecture that column-stores may be good storage layers for Semantic Web data, XML data, and data with GEM-style schemas.
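To make the sparse-data claim concrete, here is a minimal Python sketch of a column-oriented layout in which each column keeps only its non-null cells. It is an illustration under stated assumptions, not the storage format the paper describes: the class `SparseColumnTable`, its methods, and the sample attribute names are all hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


class SparseColumnTable:
    """Toy column-oriented table (hypothetical, for illustration only).

    Each column is a dict keyed by row id, so a NULL cell takes no space.
    """

    def __init__(self) -> None:
        self.columns: Dict[str, Dict[int, Any]] = {}
        self.row_count = 0

    def insert(self, row: Dict[str, Any]) -> int:
        """Add a row; only non-null attributes are materialized."""
        row_id = self.row_count
        self.row_count += 1
        for name, value in row.items():
            if value is not None:
                self.columns.setdefault(name, {})[row_id] = value
        return row_id

    def scan(self, name: str) -> List[Tuple[int, Any]]:
        """Read one column without touching any other column."""
        return sorted(self.columns.get(name, {}).items())

    def get(self, row_id: int, name: str) -> Optional[Any]:
        """Look up a single cell; missing cells read as None (NULL)."""
        return self.columns.get(name, {}).get(row_id)

    def stored_cells(self) -> int:
        """Count the cells that are physically stored."""
        return sum(len(col) for col in self.columns.values())


# Three rows over three attributes, most of them absent.
table = SparseColumnTable()
table.insert({"name": "Ann", "chess_rating": 1800})
table.insert({"name": "Bob", "bike_brand": "Trek"})
table.insert({"name": "Cy"})

dense_cells = table.row_count * len(table.columns)
print(table.stored_cells(), "cells stored vs", dense_cells, "in a dense row layout")
print(table.scan("chess_rating"))
```

In this sketch, adding a new attribute creates one new column and leaves existing rows untouched, which is the intuition behind very wide tables costing little in a column layout. Real column-stores use more compact encodings than Python dictionaries.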
Similar resources
Chapter 1: Background
• A representation for sparse data. Consider attributes about an employee, and suppose we wish to record hobbies data. For each hobby, the data we record will be different, and hobbies are fundamentally sparse. This is straightforward to model in a relational DBMS, but it leads to very wide, very sparse tables. This is disastrous for disk-based row stores but works fine in column stores. In the ...
Database System Support of Simulation Data
Supported by increasingly efficient HPC infrastructure, numerical simulations are rapidly expanding to fields such as oil and gas, medicine, and meteorology. As simulations become more precise and cover longer periods of time, they may produce files with terabytes of data that need to be efficiently analyzed. In this paper, we investigate techniques for managing such data using an array DBMS. W...
Data Compression in Database Query Processing
Row-oriented databases (or “row-stores”) employ data compression methods (such as dictionary encoding) to reduce I/O cost by decreasing data sizes. However, row-stores face two limitations when applying data compression schemes: (1) they only allow encoding a single value at a time, and (2) they have to pay the decompression cost during query processing. The above shortcomings l...
Reducing Overhead in Sparse Hypermatrix Cholesky Factorization
The sparse hypermatrix storage scheme produces a recursive 2D partitioning of a sparse matrix. Data subblocks are stored as dense matrices. Since we are dealing with sparse matrices, some zeros can be stored in those dense blocks. The overhead introduced by operations on these zeros can become very large and considerably degrade performance. In this paper, we present several techniques for reduc...
Algorithm 8xx: a concise sparse Cholesky factorization package
The LDL software package is a set of short, concise routines for factorizing symmetric positive-definite sparse matrices, with some applicability to symmetric indefinite matrices. Its primary purpose is to illustrate much of the basic theory of sparse matrix algorithms in as concise a code as possible, including an elegant method of sparse symmetric factorization that computes the factorization...